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ABSTRACT A detailed quantitative kinetic model for the Polymerase 
Chain Reaction (PGR) is developed, which allows us to predict the probability 
of replication of a DNA molecule in terms of the physical parameters involved 
in the problem. The important issue of the determination of the number of 
PGR cycles during which this probability can be considered to be a constant is 
solved within the framework of the model. New phenomena of multi-modality 
and scaling behavior in the distribution of the number of molecules after a 
given number of PGR cycles are presented. The relevance of the model for 
quantitative PGR is discussed, and a novel quantitative PGR technique is 
proposed. 
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I. INTRODUCTION 



The Polymerase Chain Reaction (PGR) is one of the most widely used techniques in 
modern molecular biology. It was devised as a method for amplifying specific DNA 
sequences (targets), and the scope of its applications stretches from medicine 0, through 
in vitro evolution to molecular computers @J§]. In spite of its ubiquity in biology, 
theoretical discussions of PGR are rare. Although kinetic models of the enzyme-mediated 
polymerization of single-stranded DNA have been reported none of them were applied 
to model PGR, and only recently a treatment of the rate of mutations arising in PGR has 
been considered PJTOH. 

The main object of our study is the probability that one molecule will be replicated in 
one PGR cycle, the so-called efficiency p. In Section II we present a detailed kinetic model 
of the polymerization and find p as a function of the physical parameters of the problem. 
This allows us to discuss the range of validity of the assumption of constant probability of 
replication, on which statistical considerations have been based P,pi]|. Within that range, 



we apply the theory of branching processes in Section III, to show the existence of new 
phenomena: the probability density function (pdf) of the number of molecules after a given 
number of cycles of PGR displays scaling behavior, and under some conditions, this pdf is 
multi-modal. In Section IV a novel method for quantitative PGR is presented, based on the 
statistical considerations of the previous sections. In Section V we summarize our work. 

One cycle of PGR consists of three steps. (For a more detailed account of the PGR tech- 
nique see, e.g., In the denaturing step, the two strands of the parent DNA molecule in 
solution are separated into single-stranded (ss) templates by rising the temperature to about 
95°C to disrupt the hydrogen bonds. In the annealing step, the solution is cooled down to 
approximately 50°C to allow the primers, present in a high concentration, to hybridize with 
the ss DNA. The primers are two (different) 20 to 30 nucleotides long molecules which are 
Watson-Grick complementary to the 3' fianking extreme of the templates. Once the primer- 
template heteroduplex is formed, it acts as the initiation complex for the DNA-polymeras^ 



^Polymerases are interesting pieces of machinery |12|. They are responsible for the duplication of 
genetic information (DNA-polymerases) and its transcription into RNA (RNA-poIymerases) . 
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to recognize and bind to. This step is crucial for the specificity of the amphfication: only 
those molecules that have sequences complementary to the primers will be amplified. The 
last step is a polymerization reaction, in which the solution is heated to 72°C, the optimal 
working temperature for Thermus Aquaticus DNA polymerase. This enzyme catalyzes the 
binding of complementary nucleotides to the template, in the direction that goes from the 
primer to the other extreme]^. Notice that if this polymerization proceeded to its end, at 
the end of the third step we would have twice as many DNA molecules as we had at the 
beginning of Step 1. These three steps constitute one cycle of the PGR, which is usually 
30 seconds to 2 minutes long. The cycles are repeated a number of times (typically 30) by 
varying the temperature in the solution, in such a way that the DNA molecules that were 
synthesized in a given cycle are used as templates in the following one. In this way one gets 
an extremely efficient amplification mechanism for DNA. 

II. KINETIC MODEL 

We will represent the last two steps of a typical cycle of PGR by means of a kinetic 
model. Our species will be the primers {pr, of length Lp nucleotides), the ss DNA (ss, 
consisting of Lp + nucleotides), the heteroduplexes {hi, formed by one complete ss and 
the partially assembled complementary strand consisting of the primer and the next i nu- 
cleotides), the nucleotides (n, which will be considered identical), the polymerase (g) and 
the heteroduplexes hi with the polymerase attached to them (qhi). Denoting by K,2j-i and 
K2j the forward and backward chemical reaction rates, the chemical equations are 

Step 2 { 
Step 3 < 



^The DNA is a polar molecule, and the polymerase can only attach new nucleotides to the 3' end 
of the molecule that is being extended. 



ss + pr 


K1^2 


ho 


hi + q 




qhi 




qhi + n 


«5^6 


qhi+i 










hN + q 




qhN- 



0<i< N -1 
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Other recognizable species might be present in the chain reaction. This can occur because 
of substitutions, additions or deletions of nucleotides by the polymerase, or because of the 
presence of sequence-dependent structures. These will not be taken into account in our 
model, for the sake of simplicity. Assuming that the effects of inhomogeneities in density 
and temperature are irrelevant, it is well known that Eqs. lead to a corresponding system 
of first order nonlinear differential equations for the concentrations of the different species as 
functions of time, which we are not going to write here (see, for example, [l^). In the above 
reactions, one should assign a given duration to Step 2 and another to Step 3. For the sake 
of simplicity, however, we shall consider both Step 2 and Step 3 as running simultaneously 
in the simulations to be presented below. This is a mild simplification which does not alter 
the conclusions to be drawn. 

The definition of the efficiency p implies that it is simply the ratio between the number 
of ss molecules that were completely replicated at the end of a given cycle and the initial 
number of ss molecules in that cycle: 

... _ [hNm + [qhj^m 

H(o) 

Figure 1 shows plots of the probability of replication p as a function of time t (which is to be 
interpreted as the duration of Step 3 in a typical PGR cycle), for different polymerization 
lengths A^. Since to the best of our knowledge the chemical reaction constants k's have 
not been measured for Taq polymerase, we have assumed some values for these constants to 
exemplify the principal characteristics of our model. It should be stressed, however, that the 
equivalent to some of these constants have been measured for other polymerases such as T4 
polymerase 0, T7 polymerase [0] and for DNA polymerase I (Klenow fragment) 0. The 
values of the chemical reaction constants k and the initial conditions used in the simulation 
of Eq. (|l]) are detailed in the caption to Fig. 1. The main features of the curves in Fig. 1 can 
be quantitatively understood. It can be observed that the larger A^, the flatter the behavior 
at small times. Indeed it can be shown from the dynamic equations that p ~ 
for t sufficiently small. The time at which p has reached about half its asymptotic value, 
as well as the width of the rise-time can be estimated from a further simplification of our 
model. Assuming that the time constants associated with the backwards reactions in Eq. (|1]) 
are large enough, and that the concentration of primers, polymerase and nucleotides are 
sufficiently large that their relative concentrations can be considered as constants (or more 



precisely as slowly varying parameters) during the process, we can rewrite the reaction as 

pr q n n 

I 1^1 J Ik3 7 its 7 -Lk5 7 /0\ 

ss — s> Aio — > qho QiT'i ■ ■ ■ ^ qiiN- [p) 

The time r needed for this reaction to be completed is simply the sum of the times corre- 
sponding to each link of the chain, r = r^^ + r^g + Tk^^i + ... + T^jg tv, where t^^ and are the 
times associated with the first two reactions in Eq. (|^), and r^gj is the time associated with 

the reaction qhj^i qhj. These r's are independent, exponentially distributed random vari- 
ables, whose mean values are (r^i) = {i^i[pr])~^, (r^g) = (K3[g])~\ and {Tf^^j) = {^^[11])^^. 
Therefore, it can be readily seen that the mean and the standard deviation of r are 



1 1 1 



1/2 



where we used that the variance of a exponentially distributed random variable is the square 
of its mean, (r) and ar can be used as estimates of mean rise-time and the rise-time width 
about the mean for the complete reaction. These estimates are shown in Fig. 1. The 
abscissa of the solid square on each curve corresponds to the value predicted by Eq. (^), and 
the arrow heads indicate the values of (r) ± cr,-. It can be safely concluded that Eqs. @ 
and d^), computed from the simplified chain reactions of Eq. (|^), are good estimates of the 
mean rise-time and the rise-time width corresponding to the full set of reactions. 

The last important feature to be extracted from Fig. 1 is the tendency of p{t) towards 
an asymptote p^o, which corresponds to the equilibrium of the chemical system. This value 
is of importance in PGR, and thus it is worth computing it in terms of the parameters of 
our model. The detailed balance equilibrium conditions for the reactions of Eq. (|I]) demand 

that [ss]eq/[ho]eq = ^2/ {Ki[pr]eq) = «!, [hi]eq/ [qhi]eq = ^4/ (/tS [?] eg) = "3 (for < i < 
N - 1), [qhi]eq/{[qhi+i]eq) = K6/{K5[n]eq) = ^5 (for < i < iV - 1) and [hN]eq/[qhN]eq = 

f^8/if^7[q]eq) = "7- On usiug Eq. (0) and the conservation relation [ss](t) + I]^o[^«](^) + 
[qhi]{t) = [ss](0), one obtains that 

^ 1 + Q7 (f.. 

1 + ^7 + cuasa^" + ^^(^5 ^ - as) 

For the purpose of computing Poo one should know the values of [pr]eq, [n]eq and [q]eq- 
As an approximation to these values one can use the initial values of these species at the 



beginning of the cycle. This approximation will be excellent if this initial concentrations are 
sufficiently large. The values of poo (computed under this approximation) corresponding to 
the conditions of the simulations of Fig. 1 are 0.87 for = 1 and 0.85 for = 10 and 
N = 45, in perfect agreement with the complete simulation. It is interesting to notice that 
from direct measurements of p(t) a wealth of information on the rate constants involved in 
the polymerization reactions can be inferred using Eqs. (0)-®. 

Of utmost importance in applications of PGR is the number of cycles of PGR during 
which the amplifying process is exponential. As will be discussed later on, the mean number 
of molecules (A'fc+i) at cycle A; + 1 is related to the mean number of molecules (A'fc) at 
cycle k by the relation (N^^i) = (1 + Pk){Nk) , where pk is the efficiency during the k-th 
cycle. Therefore the rate of growth will be exponential only when p^ is independent of k. 
During how many cycles can the system maintain pk constant? The answer can be found 
if we think that during these cycles, both the concentration of primers and nucleotides 
will also be decreasing exponentially, and therefore their concentration at cycle k will be 
[pi^]k = [pr]Q — (1 + p)''[ss]o and [n]k = [n]o — (1 + p)''N[ss]q. The mean rise-time and 
rise-time width for p at cycle k, (r)^ and ar^k, will be given by Eqs. and (^), with [pr] 
and [n] replaced by [pr]^ and [n]k respectively. If the time for the reaction is t, then the 
maximum number of cycles u during which p^ can be considered constant will be given, to 
a first approximation, by the u that verifies that (r)^ + Cr.u = t. This imposes an equation 
for u that can be solved numerically. An approximation to this solution is 



where logt indicates logarithm to the base b. Notice that as t becomes larger, the value 
of u predicted in Eq. tends to a constant independent of t, given by the number of 
cycles that it takes to deplete the solution of nucleotides or primers, whichever is exhausted 
first. Although it might be unrealistic for the conditions used in molecular biology, it is 
interesting to notice that if the nucleotides are the first species to be exhausted, then most 
of the heteroduplexes will cease to polymerize before reaching the end, with the outcome 
that there will hardly be any complete double helix formed: in this case the net amplification 
factor will be close to zero. Figure 2 shows the efficiency pk at cycle /c as a function of the 





7 



number of cycles, for different times of polymerization and = 45 (the other parameters 
are as in Fig. 1), obtained by the integration of the reactions of Eq. (^. In this simulation 
we concatenated cycles assuming a perfect melting step, which was done by hand by setting 
[ss]fe+i(0) in the cycle k + 1 equal to [ss]fc(0) + [hN]k(t) + [qh^jkit) of the previous cycle 
and [/ij]/c4-i(0) = [qhi]k+i{0) = 0, {0 < i < N). The dynamics of pr and n, on the other 
hand, was followed exactly. It is clear from Fig. 2 that there is a regime for which is 
roughly constant, and that the extent of this regime tends to decrease with t. The values 
of z/ predicted by the condition {t)^ + = t are 13 for t = 0.8 sec and 15 for t = 2.0 sec 
[slightly overestimated by Eq. (|^), whose integer part yields 14 for t = 0.8 sec and 15 for 
t = 2.0 sec], in rough agreement with the values of about 12 and 14 respectively obtained 
from Fig. 2. 

At this point a few important considerations are in order. The fraction 1 — p of molecules 
whose replication was incomplete will give rise to incomplete complementary single strands. 
Only when these incomplete replicas are close to completion will they be able to bind a primer 
in the next cycle, and thus be replicated. Therefore the efficiency p defined in Eq. (j^) is 
an underestimation, since /iiv_i, hiy_2, . . ., h^_j as well as qh^-i, qhN-2, ■ ■ qhN-j (for 
some j < Lp, where Lp is the length of the primers), will be part of the pool of templates 
in subsequent cycles. However, the dominant process will be the replication of the complete 
strand, which justifies the computation of p as in Eq. (^. There is another issue that needs 
some discussion. All the complementary strands arising from both complete and incomplete 
replication of a template can anneal to that template in subsequent cycles, and therefore 
can act effectively as primers. Strictly speaking, at any given cycle k > 1 there will be a 
pool of primers of different lengths. An estimate of the concentration of "primers" arising 
from incomplete replication at cycle k is ^^(1 + p)'^[ss](0). This amount is always smaller 
than the concentration (1 +p)^[ss](0) of completely replicated single strands which act also 
as potential "primers". As long as the concentration of incomplete replicas remain much 
smaller than the concentration of primers [pr] , Eqs. (|^) will constitute a good approximation 
to the PGR process. Recall now that u [see Eq. (|^] is equal or smaller than the number of 
cycles required for the concentration of primers [pr] to match the concentration of completely 
replicated single stranded molecules. It follows that the approximation given by Eqs. (|l]) 
will break down only after the number of PGR cycles is bigger than u and therefore our 
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basic conclusions, contained in Eqs. (|D-(0), are not altered. 



III. STATISTICAL ANALYSIS 



As seen above, the efficiency p can be assumed to be constant for a number of cycles 
of PGR. The statistics of PGR can be readily computed under this assumption. The basic 
element in the analysis is the recursive relation that links the number of replicates after 
cycle number n + 1, Nn+i in terms of Nn, 



n+l 



Nn + B{Nn,p) 



(8) 



where B{Nn]p) is a random variable whose distribution is binomial with parameters 
and p. The basis for this relation is that at the n + 1-th cycle, there will be not only the 
Nn molecules that were present at the previous cycle, but also the number of successful 



replication after Nn Bernoulli trials each one with probability p of success. The number 
of molecules in the initial sample will be denoted by Mq. 

The ffist moments of Nn can be easily computed from Eq. (H): 



fXn={Nn) = Moil+py 



{{Nn-fIn?)=Mo 



p 



1+P 



(9) 



(10) 



Furthermore, using the theory of branching processes |T^,IB[, a recursive relation between 
P^^°{k) (the probability that there are k molecules at cycle n, having started with Mq of 
them) and Pn-i{k) can be obtained 



Jmax 



3 

k-j 



(11) 



(where jmax = min{Mo2"'~^, k}, and [|] denotes the integer part of |), which when supple- 
mented with the initial condition P^°{k) = 6k,Mo allows us to compute Pn°{k) for any n. 
Figure 3 shows the form of these probability functions for n = 10 with Mq = 1 in Fig. 3(a), 
and Mo = 50 in Fig. 3(b), and different values of p. A remarkable resonance- like behavior 
can be observed in the curve corresponding to p = 0.9 and Mq = 1 [wavy curve in Fig. 
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3(a)]. This phenomenon originates in the discrete nature of the process: if at the first cycle 
the system fails in replicating the only original template, then the subsequent growth of the 
population will be as if there were nine cycles instead of ten. The other peaks correspond to 
the failure in replication in the first two cycles, three cycles, etc. This trait is characteristic 
of values of p between say 0.8 to 1. For smaller values of p the function looks smoother. 
A common feature of the curves in Fig. 3(a) is the existence of a power law regime in the 
region of small A'^, whose origin will be discussed later on. The behavior of the curves with 
Mq = 50 is simpler: they are basically Gaussian curves, with a mean that increases with p 
and a variance that first increases and then decreases with p [see Eqs. (P) and (p^O])]. 



In order to understand the features described above, it is convenient to use the formalism 
of generating functions jl^. The generating function of P^^"(A'„) is simply gn,Mo{s) = {s^"). 



Using Eq. (^ it is clear that gi^i = (1 ~p)s + ps^. It can be shown that for a branching 
process 

9n,Mo{s) = 9n^l,Mo[9l,l{s)] = ■.■= [^fl (s)]*^" , (12) 

where we have denoted by gi"i{s) the n-th composition of gi,i{s) with itself, and used that 
go,Mo{s) = s^'^° in the last equality. To proceed, we use the formalism of characteristic 
functions. The characteristic function (j)n,Mo{^) of the distribution of Nn having started 
with Mo molecules, which is by definition the Fourier transform of P^^'^{Nn) [Q, is simply 



0n,Afo(^) = fl'n,Mo(e*'^). In tcrms of the characteristic functions, Eq. (|T2D implies that 

0n,A/o(^^) = [<Pn,lii^)]^'° ■ (13) 

The characteristic function of the sum of Mq independent random variables is simply the 
product of the characteristic functions of each of them. Therefore, the physical interpretation 
of the last equation is that the amplification cascades produced by each of the Mq original 
molecules proceed independently, without interaction. From this observation and the central 
limit theorem it follows that as the number of molecules Mq becomes larger, the distribution 
of Nn tends to a Gaussian. This explains the observed features of the pdfs of Fig. 3(b). 

The behavior of the pdfs for finite Mq in the limit of n ^ oo is a little bit more interesting. 
In fact, it is clear from Eq. (|13|) that it suffices to study the case Mq = 1, which we do next. 
We should stress that our study of the asymptotically large n regime does not aim at 
understanding the behavior of PGR when infinitely many cycles are performed. In fact we 

10 



have shown in the previous Section that the efficiency can be considered as constant only for 
a finite number of cycles. Rather, the reason for studying this asymptotic regime is that the 
convergence of the finite n case to the n ^ oo case is fast enough that many of the features 
arising for finite n are well explained by the study of asymptotically large ra, most notably 
the power law behavior of the low regime of Fig. 3(a). It follows from Eq. (|l^) that 
9n,i{s) = gi,i[gn-i,i{s)], which in terms of the characteristic functions and of the explicit 
expression for gi,i{s) becomes 

(pnAuj) = (1 - p)0„_i,i(cj) + p[0„_i,i(cj)]2. (14) 

Given that we are going to consider the limit of n ^ oo and (A'^^) = (1 +p)" [see Eq. (^] 
diverges in this limit, it is convenient to use the random variable Nn = Nn/{Nn). Denote 
by 9n,i{uj) its characteristic function. It is easy to show that 6n,i{uj) = 4>n,i{-(i^^) , which 
on using Eq. (|1^) yields 

^„,i(a;) = (l-p)9^_,^,(-^)+p[9^_,^,(-^)]^. (15) 

Notice that Eq. (|l|) can be thought of as a dynamical system, that maps the point Zn to 
Zn+i = f{zn) = {l—p)zn+pz'^. The fuuctiou 6'„ i(u;) (— oo < UJ < oo) parametrizes a curve in 
the complex plane. In fact, the initial condition Mq = 1 determines that 9q^i{uj) = e**^, which 
parametrizes the unit circle Co- Subsequent applications of the map f{z) to Co produces the 
new curves (i, (2, • • •, which are parameterized respectively by 6'i^i(co'), 6*2, 1(0;), .... The 
study of the limiting behavior of the pdf of Nn is thus associated with the study of the 
invariant curves of the map /. Notice that the map / has only two fixed points, one at 
z = (stable) and one at z = 1 (unstable). Upon iteration, all the infinitesimally small 
straight lines with slope A passing through the repelling point z = 1 will generate a curve Cx 
which is invariant under /, that is f{C\) = C\. On the other hand, for any z ^ 1 such that 
\z\ < 1, \f{z)\ < \z\. Therefore the dynamics of this map brings all the points of Co (except 
for 2; = 1) to the origin. In the neighborhood of 2 = 1, Co is locally a straight line with 
slope A = 00, which upon evolution will become the invariant manifold Coo- It follows that 
Coo coincides with Coo, and ^oo.i parameterizes the invariant manifold of the map /, that 
crosses z = 1 parallel to the imaginary axis. Figure 4(a) shows half the invariant manifold 
Coo corresponding to p = 0.9 (the other half is its complex conjugate), and on the same plot 
the imaginary part vs the real part of 6*15^1 (tu) (for positive u). To the level of resolution of 
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the figure no departures between the two curves are observed, meaning that the pdf of the 

number of molecules at 15 PGR cycles is well approximated by the limiting pdf. 

This dynamical-system way of looking at the characteristic function of Nn is very useful 

to understand the power law behavior of the pdfs of A^^^. The argument goes as follows. 

Close to 2 = (or equivalently, for large values of u) the quadratic terms in Eq. ([15|) can be 

neglected, and the resulting approximate relation, 6^00, i(c^) = (1 — p)6'oo,i(c^/(l accepts 

iii(i-p) 

as a solution the ansatz 6'oo,i(i-i^) ~ A(lnci;)ci;i"(i+p) , where A{x) is in principle any periodic 
function with period ln(l +p). The large uj behavior of the characteristic function 6'oo,i(i^) 
is then a power law, with logarithmically periodic modulations. That this is so is shown in 
Fig. 4(b), where we have plotted the absolute value of ^151(^7) for p = 0.9. The power law 
corresponding to the predicted scaling exponent of ]||||~^| is shown as the straight line close 
to the curve in the log-log plots of Fig. 4(b). The implication of this results for the pdfs 
can be readily drawn. Recalling that the characteristic function and pdf are related through 
a Fourier transform, and that the Fourier transform of \uj\'^ (with an appropriate infrared 
cut-off) scales as we conclude that the pdf of A^^ should exhibit a scaling of the form 

P^{Nn) ~ Nn '"*^+''' . This scaling law is shown as the straight lines close to the curves 
plotted in log- log scale in Fig. 3(a). 

In the following Section we apply some of the results presented so far to the problem of 
quantitative PGR. 



IV. QUANTITATIVE PGR 

Although PGR is used mainly in a qualitative fashion, its potential for becoming an 



important tool in nucleic acid quantification in general IT3, and in medical research in 



particular |jT8[ has become clear in recent years. By quantitative PGR one means the use of 



the PGR to measure an unknown initial number of molecules Mq. A few techniques have 
been developed to that effect in the past, but the most widespread is probably the so-called 
competitive PGR (see, e.g., Ijl9[)- In this technique, the target, whose initial concentration is 
unknown, is amplified simultaneously with a standard, which is flanked by the same primers 
as the target and whose initial concentration is known. The standard should have a length 
different from that of the target, so that both can be resolved in an electrophoretic gel. The 
basic idea in competitive PGR is that if the efficiencies of replication of the target and the 
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standard are the same then the ratio of the concentration of target to that of the standard is 
constant in the reaction. Measuring that ratio at cycle n (where presumably we have enough 
concentration to use densitometric measurements) we can solve for the initial concentration 
of target. While this technique is very attractive, the basic assumption (the equality of the 
efficiencies in both species) has some drawbacks . Basically, the potential problems arise 
in the dependence of the efficiency on the length of the DNA molecule. The longer molecule 
will experience a decrease in efficiency before the shorter one does, as predicted in Eq. (^. 
In any case the model presented here can be of use to assess the validity of the assumptions 
that go in the basics of competitive PGR. 

In order for competitive PGR to work, the length of the standards have to be within a 
narrow window: it has to be sufficiently different from the length of the target molecules (to 
be resolved in a gel) and sufficiently similar to it in order for the equal efficiency assumption 
to work. The design of a good standard requires some ingenuity, and has to be done on a 
case by case basis. In what follows we will present a design for measuring Mq without the 
need of a standard. Suppose we measure the concentration of a given DNA molecule after 
a number of PGR cycles on a sample whose Mq is unknown. One might think that if we 
repeated the same measurement for a reasonable number of times (say around 100 times, 
given that PGR equipment with capacity for 96 vials are not uncommon), so as to measure 
the mean value and the variance of the concentration across that number of experiments, 
we would have two equations [Eqs. (^) and (p!0|)] that can be solved for the two unknowns p 
and Mq. However, it can be shown that this procedure always yields two possible solutions 
for p and Mq, and there is no possible way a priori, of choosing the right one. The reason 
for this is that for Mq bigger than a few hundreds (which is nonetheless a small number of 
molecules), the distribution of is Gaussian, and therefore determined only by the mean 
and the variance, which give the above mentioned ambiguous answer. 

Gonsider instead the following scheme. We prepare two sets of samples 5*1 and 5*2, each 
with K identical preparations and whose initial concentration of a given double-stranded 
DNA molecule is unknown. We run (under conditions for which p can be considered ap- 
proximately constant from cycle to cycle) ni cycles of PGR on set Si, and n2 cycles on set 
5*2, after which we measure the number of molecules in every sample. The averages z/i and 
1^2 over the K preparations in Si and 5*2, are estimates of the ensemble averages and 



13 



fin2 corresponding to Eq. for n = rii and n = n2 respectively. We can use that formula 



to compute mo = i^i "i""2^^"i-"2 estimate of the real Mq and p) = Vi^'"'^ 1^2 "^""^ — 1 

as an estimate of the real p. Of course these estimates make sense only if a measure of the 
error involved in the method is provided. It takes a simple calculation to show that, 



and 



(mo) ^ Mo; (p) ^ p, (16) 



_ ((mo - (mo))^) _ 1 1-p nj + nl 

{mof ^ MoKl+pin.-n^r ^ ^ 

^ ^ {jp-ip)?) ^ 2 



In writing the last two equations we used Eq. (|10D . We tested these expressions in a set of 
very simple numerical simulations, whose details we are not going to report here except for 
saying that the PGR amplification was represented by the cascade given by Eq. (H). Under 
variations of all the parameters involved, Eq. (|l^)-(0) were in excellent agreement with 
the numerical results. To get a flavor of the precision of the method proposed, assume a 
simple example with Mq = 1000, p = 0.8, rii = 10, n2 = 15 and K = 50. Under these 
conditions the above equations predict that the estimate of Mq will be correct within 0.5% 
(that is ±5 molecules) and that of p will be correct within 0.1%!. These estimates refer 
to the purely statistical errors, and they will be fairly small under typical conditions. In 
real experiments they have to be supplemented with the errors involved in the measurement 
of the concentrations. If Mo and p fluctuated from sample to sample (due to inevitable 
differences in their preparations), the fact that we are averaging over K samples will screen 
these fluctuations. In this latter case, Eqs. (|16]) will still be in agreement with the average Mo 
and p, and Eqs. (0) and (0), which can be easily generalized to include these fluctuations, 
will give their right order of magnitude. 



V. SUMMARY 

We have presented a kinetic model for the PGR, which can be the basis for a more 
accurate application of quantitative techniques, as it provides a dynamical account of the 
probability of replication as a function of the physical parameters involved. These include 
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the rate constants of the different reactions. Conversely, the model allows us to extract 
information on these rates from direct measurements of p. From a theoretical point of view, 
it can also be used in the description of in vivo and in vitro enzymatic polymerization 
processes |]22|. The statistical analysis of PGR under the assumption of constant replication 
probability shows new interesting phenomena. The scaling behavior of the pdf is an effect of 
the recursivity of the process, whereas the mult i- modality is related to failures in replication 
during the first cycles. Although the latter is a phenomenon present only for a small number 
of initial molecules, it is not far from actual experimental conditions, and might be of 
relevance in quantitative applications. 

Finally, we are using the statistical considerations of section IV to devise a method for 
measuring the initial number Mq of molecules in a sample (quantitative PGR) pl| . 

Acknowledgements: Many of the ideas in this paper are the product of long and fruitful 
discussions with P. Kaplan and M. Magnasco. We thank A. Libchaber and E. Mesri for 
a careful reading of the manuscript and useful discussions, and an anonymous referee for 
helpful suggestions. Support from the Mathers Foundation is gratefully acknowledged. 



[1] Mullis, K. B. & Faloona, F. A. (1987) Methods Enzymol. 155, 335-350. 
[2] Gibbs, R. A. (1990) Analytical Chemistry 62, 1202-1214. 

[3] Gestland, R. & Atkins, J. (eds.), (1993) The RNA world (Cold Spring Harbor Press). 
[4] Adleman, L. M. (1994) Science 266, 1021-1024. 

[5] Kaplan, P., Cecchi, G. & Libchaber, A. (1995) submitted to Science. 

[6] Capson, T.L., Paliska, J.A., Fenn Kaboord, B., West Frey, M., Lively, C., Dahlberg & M., 
Benkovic, S.J. (1992) Biochemistry 31, 10984-10994. 

[7] Patel, S.S., Wong, I. & Johnson, K.A. (1991) Biochemistry 30, 511-525. 



15 



[8] Kuchta, R. D., Mizrahi, V., Benkovic, P. A., Johnson, K. A. & Benkovic, S. J. (1987), Bio- 
chemistry 26 8410-8417. Dahlberg, M. E. & Benkovic, J. (1991), Biochemistry 30, 4835-4843. 
Kuchta, R. D., Mizrahi, V., Benkovic, P. A., Johnson, K. A. & Benkovic, S. J. (1987), Bio- 
chemistry 26 8410-8417. 

[9] Sun, F. (1995) J. Comp. Biol. 2, 63-86. 

[10] Weiss, G. & von Haeseler, A. (1995) J. Comp. Biol. 2, 49-61. 

[11] Sammbrook, J., Pritsch, E. P. & Maniatis, T. (1989) Molecular Cloning, a laboratory manual, 
2nd Ed. 

[12] Yin, H., Wang, M. D., Svoboda, K., Landick, R., Block, S. M. & Gelles, J. (1995) Science 
270, 1653-1657. 

[13] van Kampen, N. G. (1981) Stochastic processes in physics and chemistry (North-Holland, 
Amsterdam) . 

[14] Feller, W. (1968) An Introduction to Probability Theory and Its Applications, Vol. I, 3rd Ed. 
(Wiley, New York). 

[15] Harris, T. E. (1963) The theory of branching processes (Springer- Verlag, Berlin). 
[16] Athreya, K.B. & Ney, P.E. (1972) Branching Processes (Springer- Verlag, Berlin). 
[17] Ferre, P (1992) PCR Methods Applic. 2, 1-9. 

[18] Clementi, M., Menso S., Bagnarelh, P., Manzin, A., Valenza, A. & Varaldo, P. (1993) PCR 
Methods Applic. 2, 191-196. 

[19] Guilliland, G., Perrin, S., Blanchard, K. & Bumm, P. (1990) Proc. Natl. Acad. Sci. USA 87, 
2725-2729. 

[20] Raeymaekers, L. (1993) Annal Biochem. 214, 582-585. 

[21] Cecchi, G., Peygenson, D., Kaplan, P. k, Stolovitzky, G. (1996) to appear. 

16 



[22] McAdams, H. H. & Shapiro, L. (1995) Science 269, 650-656. 



17 



Figure Captions 



Figure 1. Probability of replication p{t) as a function of time t, for different template 
lengths (in number of nucleotides without including the primers), arising from a numerical 
simulation of Eqs. (|l]), with parameters: Hi = 10^M~^s~^, K2 = lO^^s"^, K3 = 10'^M~^s~^, 
= lO-^s-i, = WM-\s-\ kq = 15s-\ Kr = 10^M-h-\ Kg = lO'^s-\ [pr]{0) = 
10'^ M, [n]{0) = 10-^M, [g](0) = lO'^M, [ss]{0) = 10"^^. The square and arrow heads 
indicate, respectively, the mean rise-time and rise-time width as predicted by the simplified 
model of Eqs. (|). 

Figure 2. Efficiency pk as a function of the cycle number k, for different polymerization 
times. The length of the template is = 45; the other parameters are as in Fig. 1. 

Figure 3. (a) pdf of the number of molecules after n = 10 cycles and Mq = 1 of a branching 
process with constant efficiency p, in log-log scale. Notice the mult i- modality for p = 0.9, 
and the power law regimes (straight lines), (b) Same as in (a) for Mq = 50 (linear scale). 
The multi-modality has disappeared even for p = 0.99. 

Figure 4. (a) The invariant manifold that crosses z = 1 tangent to the unit disk, of the 
map Zn+i = (1 — p)zn + pz"^ (with z in the complex plane), for p = 0.9. It is parametrized 
by 6ao^i{uj). In the same plot the curve parametrized by ^15,1 (cu) is shown, and cannot be 
resolved from the invariant manifold, (b) Absolute value of 6'i5 i(ci;), and the predicted power 
law. 
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Figure 3(a) 



Figure 3(b) 
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